"Learning about the crowd behind OpenStreetMap through interactive visualization of the project history." By: Sterling Quinn. >> This talk will be somewhat similar to the last talk open on OpenStreetMap analytics. But I guess the big question I have when I look at a map, how many people made this map and who are they? This talk is a little more placed based. But it's very similar in nature in the previous one in terms of trying to visualize open stream after looking at the project history. My name is Sterling Quinn, I just finished up some graduate work at Penn State University, that's where most of this work is completed. And its I'll be starting at central Washington University as an assistant professor in the geography department there. So if you're from the Seattle area, you'll be in touch and around in the future. So what we're doing is looking at OpenStreetMap history, what happened in this project. Particularly who was involved. And then you might be familiar that OpenStreetMap publishes these big history dump files. There are two that are utilized here. One of the files has the actual geometries in it and tags. The other one is the change history, which has the editor comments. And then you can then link to the edits and learn a little bit about the motivation behind the edits. So those are the two files that are used in this analysis. But this is -- talk more about data visualization, not so much about data processing. OSM history data, it's fundamental ML. It's really big, but we can -- we have ways to deal with that. And as you saw in the previous talk, there's a lot of really smart people working on frameworks for calculating all kinds of statistics and extracting this history. So I'm not going to focus on that part. What I will talk about is how to visualize and more than that how to make sure of what's going on in history, thinking about what we want to learn from this. And how can we make it interactive so that we could learn even more? 
This is related to a research area called visual analytics, which also grew up around data, partially with a lot of funding from the U.S. federal government in the years after 9/11. After that time we had a lot of surveillance and sensors bringing in lots of information that needed to be processed in real time, and this led to a lot of academic research on how to visualize all of the data that was coming in. But since that time, visual analytics has been applied to a lot of other things. For example, understanding crowdsourced projects, because in crowdsourced projects like Wikipedia and OpenStreetMap we have tons of data coming in all the time, and we want to make sense of it. These are screenshots of different visualization projects that have been used to understand crowdsourced data. The older one looked at Wikipedia articles. Each strand is a section of an article, and we can see when it was added or deleted, when edits occurred, and so on. The lower image there is by Brandes and Lerner, and shows how Wikipedia editors interact with each other and support each other's edits. And then there's a tool that came out several years ago called OSMatrix, which shows contributions over different time periods in Europe. It's a very nice tool that's still available to browse online. And then in the previous talk you saw some exciting work that's going on in OpenStreetMap analytics, and it will be exciting to see how that grows. When we design a visual analytics tool or application, it's good to start with some questions in mind that we want to answer. And so some of the questions that I've had over several years of research on OpenStreetMap are: how big is the crowd? If I look at the map, how many people built that? Particularly in places that don't always get a lot of attention. So maybe smaller cities. Countries, honestly, outside the U.S. It's interesting to look around the world and see how the size of the crowd varies and how many are local mappers who have had a chance to do local surveys in the area.
What is the degree of influence of institutionally supported contributors? We have a lot of companies, governments, and NGOs who are supporting OpenStreetMappers, encouraging people to map or paying people to map. So what is the level of that influence? How much influence can a single power contributor have? Some of you are power contributors; you've mapped your entire town. How does that affect the map? And then at the bottom, we notice there are a lot of people who come to the project and map once, one day or one place, and don't ever map again. What were they doing, and what can we know about that? Finally, a huge question that is focused on OSM uptake is: how much do I feel I can trust OSM for my area of interest? Those in technology have to ask themselves this question, and sometimes they advocate for use of the data and people ask them this. So can it help us understand how much we can trust OSM data if we can visualize the history? So I'm going to demonstrate a tool that I worked on called Crowd Lens for OpenStreetMap. It's a visual analytics tool for making sense of the crowd. This was built as part of my study at Penn State University, and I also had help from an undergrad research assistant, Greg, who did excellent prototyping work and is pursuing his career in GIS. So I appreciate that assistance. I'm going to do something brave that I don't think I have done at State of the Map, which is try to run a live demo off the Internet. So we'll see how that goes. Whoa, let me go back. I want you to know that you can try out this tool. Maybe wait until after this is complete. I don't know how Bluehost is. It's just running on a little cloud service. But anyway, I wanted people to play with this and look at it themselves. You're welcome to visit this site.
It was designed for a larger display, so I recommend either a laptop or desktop monitor, and the performance you have is dependent on the speed of your Internet connection. I think we have a good connection here, so we'll give this a try. So when this tool loads, it's place-based, as I mentioned. I've preprocessed the history and loaded it in for several places, which you can select from a drop-down list up here. Originally my research, which I talked about last year at State of the Map, was looking at how the map has grown in smaller cities that may fly under the radar. And I think that's a good barometer of OpenStreetMap health. Get outside the areas that are attracting a lot of attention. How can I tell if I'm achieving coverage in my country? Maybe sometimes looking at the smaller cities can help us know a little bit more about that. So I've loaded in six cities here that were between 50,000 and 100,000 population, so they're all about the same size, and then after that I loaded in urban neighborhoods, just to show that this tool is more of a proof of concept than anything and how it can work in more urban places, including Seattle. So the first place that appears on here is the Seattle Center neighborhood. That's over by where the Space Needle is, KeyArena, and so on. What you see up in the middle is the OpenStreetMap map in the background, and the overlay here is one user's work. And down here are small maps of each user who has ever contributed in this place, in this part of Seattle. That number is 84 contributors by my count as of the end of 2015. And these are sorted right now by the number of days active in the project. You can click on a contributor and see a little bit more about them over on the right-hand side. This contributor here, Seattle FYI, they've done a lot of work up here.
You can see what they've done during the past four years, and this shows the types of tags that they tend to add, and then you can look down here and see some of the comments. And we can do this for any user. So click over, here's another user. They moved a helipad. If we want to know a little bit more about this edit, it's highlighted when we click on the map, and it's linked to the table as well, so we can look at it and see what the comment was. As we look through the map and the patterns, it's interesting that sometimes there are anomalies or things that stand out that we want to investigate a little bit more. For example, bot activity tends to jump out because the edits are uniform. This is some work that was done by a bot, and we can see what the bot was doing. Other patterns like this one, this is kind of cool. We can learn about this a little bit more. I believe this was done by purple heart. One thing that I know is that this user only made six changesets in the project, and this is one of them. So this is the type of user who just engaged with OpenStreetMap for a short amount of time, and we have to figure out how to get them to come back to the map. We can also look at this project and say, hey, I want to do or build a project like that too, and it can give us some ideas. To demonstrate a couple of other capabilities of this tool, I'll switch to another area. This is going to be in the public eye pretty soon. In Brazil. This is the area of Rio. I showed that there are small maps down here representing each user. And these can be sorted by different criteria. So let's sort by the number of changesets in all of OpenStreetMap. And this brings people who have made tens of thousands of edits up to the top. Now, not surprisingly, these users have done very little work here in this area, just because this is one of many places around the world that they tend to map. If we want to find users who focus mainly on this area, we could sort by the percentage of the user's changesets in this place.
And this brings to the top a lot of people who have just edited here and nowhere else. And some of those one-time contributors are people who just map a few things. And reading through the comments can be informative. Sometimes we don't have comments, but sometimes they wrote detailed things. This person wrote, in Portuguese, that this is a grocery store they go to on a weekly basis. This person made four total changesets in the project. This is the type of information that could really enrich the map and that we want to support. There are other interesting ones on here, and reasons people come to the map. This one caught my eye. They added a local business to the map, and the name of this user was oasis collection. So I did a little online research and found that this is a person who put their own business on the map and created an account for that purpose. So they wanted to make sure they show up on the map and have a presence there. And I'll talk a little bit more about that later in the talk. If we go to another nearby place in South America, this is one of the smaller cities, in Argentina. Looking here, you might notice that these little maps have different bands of color around them. What I did was read through the comments, and you can see people use different languages as they comment. This person used Spanish in almost all their edits. I used automated language identification software to identify the language of the comments, and that's what this band of color represents. So the ones with the pink bands, these are people who tend to comment in Spanish; the blue ones are in English. Now, this is not a perfect indicator of where the person is from, but it does tell you if they know the native language and are more than likely from the area. There are filters over here where you can filter by language. So out of 41 total contributors in this town, Spanish was detected for 19 of them. And then nine of them favored English.
So in this particular town, there's a heavy balance of people using the local language. Another thing is that the map itself is a filter. So as we zoom in, this number changes, and we see only the number of people who affected the map view. So this really can answer the question of how many people made the map in this place as we zoom in and pan around. If we go to the center square of town, there's been a lot of attention here: 15 out of these 41 people have edited here. What's interesting to me is that as you pan out of this downtown region, the list really narrows, and notice that all of these images have pink bands. So those are people who tended to comment in Spanish, and there's probably more local influence on the map outside of that central area. That's one reason why I loaded some touristy areas into the tool, to do further analysis on that. We can look at other places to look at the nature of the crowd. So here's a place in Australia. And I've shown how we can look at the details for individual contributors as we click these images. Over on the left, though, are some filters that can be used to understand the nature of the crowd as a whole, and maybe try to find out who are the people who would be most likely to have that local influence. So we could filter, for example, to find the most active recent mappers, and maybe those are people we could follow up with to see if they are from the area. If we filter this down to just people who were active at least five days here, that list goes down from over 80 contributors to just four. So while there may be the appearance of a lot of mapping here, that number quickly goes down as we apply these filters. If we filter to just those people who were active in the last two years, it goes down to three. And then one thing that I've done is, if it was detected that the user has created a profile or wiki page, there's a hyperlink, so we can load this and learn a little bit more about the person and where they're from.
This person happens to be from Australia. A final thing I would like to show here is just the different levels of coverage in these cities around the world. So if we go to another city of about this size, this is one in Ghana I chose for analysis. There have been 14 contributors here in this area, and one who looks like they've added quite a few nodes. We can drill into a little bit more depth on who mapped what by clicking the tag cloud. So, for example, if we click highway, we see 12 of these people worked on highways. Or amenities: fewer people worked on those. So these are examples of what can be learned about individuals using this tool, as well as about the crowd as a whole. And, for me, it helped answer some of the questions I had about OpenStreetMap, but then I had a few more questions after using this tool. Maybe some come to your mind as well. One of those questions is related to a talk that was given last year by Alex from Mapbox, and his talk was called "The paid mappers are coming." And when I looked at the data, I found the paid mappers have come, so this is true. And there are a lot of institutionally supported mappers who are working on the map. For example, in Johnstown, Pennsylvania, if you filtered to just people who edited in 2015, I found 14 out of 34 people were working for the Mapbox data team on improving the data. And I knew that because they have a policy of clearly identifying themselves in their profiles, which I think is healthy. But there is a lot of institutionally supported mapping, whether it's people mapping on behalf of governments or companies or other entities. And I think it's healthy to think about the effects of that. I think there are positive effects and maybe some challenges. It's definitely helping the map quality, I believe. It's a good thing for the map to have additional people working on it, and it might improve the trust that we can put in the map.
I would certainly trust a map more if I knew more eyes were on the map, especially ones that were professional mappers. As for the challenges, it would be healthy to look at the changing community dynamics that would come from introducing new groups of contributors into the map, and to make sure the local influence can continue to have a place in the map. These are the kinds of questions that are great to discuss at this conference. In this talk I don't mean to come down on any one side or the other, but I think this is a good forum to discuss how we feel about this and how it affects the project. Another one that's interesting is: how do we feel about self-promotion? I showed an example of somebody who put their business on the map, and they created an account just for that purpose. What's interesting to me is that in Wikipedia this would be taboo. You can't write an article about yourself and promote your business; that's against what they do. But in OpenStreetMap, I don't know what type of policy there is about that or how we feel about that at all. I think it could be useful for somebody who wants to map themselves and put themselves on the map. But when it's done in such a way that it seems like advertising, we haven't even thought about how we feel about that in the project. So that's something to discuss as a community moving forward. Finally, how does knowing a contributor's history affect our trust of OSM? So returning to the question that I had in large text: can I trust OpenStreetMap better? We showed this tool to ten different professionals who use geospatial data in their work, and we asked them after using it, how did this affect your perception of the quality of OpenStreetMap data? About half the people said it didn't affect it at all. They said they had already staked out their opinion of OpenStreetMap data.
A lot of them used the data, but they were often suspicious of errors creeping in, so they continued to use it with those caveats in mind. Other people said that their perception of the data quality was improved. And nobody said that they became less confident after seeing the history. This was a surprise, because I thought that seeing some of those smaller crowds as you zoom in and go to peripheral areas of the map, out of the downtown and the more frequented places, would make people suspicious about OSM quality. On the contrary, people were comforted by the presence of those power mappers. They would see those individuals who had spent hundreds of days editing the map of the place, and that made them feel better about quality. And seeing the history was helpful to a number of people regarding confidence in OpenStreetMap data. Now, this tool was created just as kind of an experiment or proof of concept to show what things could be viewed from the history. I hope it sparks different ideas and discussion within the community. And, again, I invite you to visit the tool and to provide feedback. And I'm happy to take questions at this time. Thank you. [Applause]
>> Just a procedural question: what did you use for your auto language detection?
>> Good question. So what did I use for the language detection? I should have mentioned that.
>> And how happy were you with it?
>> So I used an open source module, langid.py. That was developed by some academic researchers, and I was happy with it. We did a little test. Several years ago we were analyzing areas with comments in Spanish, English, and Portuguese, and in our lab we had several people who could read those languages, so we quality-checked it ourselves against what it said. We had about a 97% success rate with it, so we were pretty happy with it. It's certainly not perfect, and one thing that gets tricky is that people use multiple languages, so a lot of times they'll use English and in other places they'll use their native language.
That's why in this analysis we've looked at all their comments and picked the most prevalent one, rather than the one that was being used locally. I want to make sure I don't go over time here.
>> I have a question about having data that's more crowdsourced versus coming from an official source, like someone who's a data owner, like an agency or something. I was wondering if you could speak about that, and whether there are mechanisms to tell apart official data from an organization.
>> Yeah. So the lines between OpenStreetMap data that's contributed by hobbyists and official data are starting to get blurred, because we're seeing a lot of official data in OpenStreetMap, first of all. And we're also seeing a lot of mapping in places that have never had much official mapping, and OpenStreetMap is it. So then it comes to be adopted as official. There was a talk in the other building earlier today where it was shown that the government was sharing OpenStreetMap data as the authoritative data, because that was the most data they had collected there. So I think we're seeing those lines starting to blur. The real thing I pick up in my research is that OpenStreetMap is a real mix of all kinds of stuff. There are imports, there are paid mappers, there are bots, hobbyists, one-time contributors, and we need to keep that in mind whenever we make statements about the data quality or statistics about how much data as a whole has come into the project. Maybe one more? Yeah, go ahead.
>> This might be a little bit specific, but I'm curious. You're doing a bunch of this analysis, and you have this influx of paid mappers and an influx of bots, so your underlying distribution, if you will, is shifting. How did that play into your analysis? Could you compensate for that? Did you?
>> The question is: the distribution of paid mappers and bots changes over time, so how does that affect my analysis?
Well, the approach I wanted to take in the things that I was writing and the questions that I was asking was to go beyond simply counting up nodes or users and to get down a little deeper as to what was behind that. I think a real key to achieving that is to read through the changeset comments, which are in that little table in the lower right-hand corner of the application. I didn't show a lot of those here, but I did spend hours going through those comments, reading what people wrote, and it's helpful in getting a more rounded picture of who's contributing where. Now, if you just look at raw numbers over time, that's going to obscure a little bit the full story about how OpenStreetMap has changed from more of a hobbyist-type project to one where a lot of people are stakeholders in the project and there's a lot of data influenced by different institutions. We should always keep that in mind. Thank you for the questions, and I'm happy to talk any time later in the conference about this project. [Applause]
>> All right. We have a half hour break for coffee and tea.
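
As an illustration of the per-contributor language assignment described in the talk and the Q&A (detect the language of every comment a user wrote, then keep the most prevalent one), here is a minimal sketch. The function names and sample comments are invented, and `toy_detect` is a hypothetical stand-in for a real detector. It is not the module used in the study and should not be taken as the tool's actual code.

```python
from collections import Counter


def toy_detect(comment):
    """Hypothetical stand-in for a real language detector.

    Flags a comment as Spanish ("es") if it contains any of a few common
    Spanish words; everything else is labeled English ("en").
    """
    spanish_words = {"calle", "plaza", "agregado", "nombre", "de", "la", "el"}
    words = set(comment.lower().split())
    return "es" if words & spanish_words else "en"


def prevalent_language(comments_by_user, detect):
    """Return {user: most frequent detected language over all their comments}."""
    result = {}
    for user, comments in comments_by_user.items():
        # Skip empty comments, since many changesets have none.
        counts = Counter(detect(c) for c in comments if c and c.strip())
        # Users with no usable comments get None rather than a guess.
        # On a tie, Counter.most_common keeps the language seen first.
        result[user] = counts.most_common(1)[0][0] if counts else None
    return result


sample = {
    "mapper_a": ["Agregado nombre de la calle", "Plaza principal", "fixed typo"],
    "mapper_b": ["Added building outlines", "Fixed road alignment"],
    "mapper_c": [],
}

languages = prevalent_language(sample, toy_detect)
print(languages)

# Tally how many contributors favor each language, as in the filter panel.
print(Counter(lang for lang in languages.values() if lang is not None))
```

Taking the majority over all of a user's comments, rather than only those made in the place being viewed, matches the choice explained in the Q&A: people often switch between English and their native language depending on where they are mapping. A real detector could be passed in place of `toy_detect`, provided it is wrapped to return a single language code per comment.
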